Efficient sequential and parallel algorithms for record linkage
Abstract
BACKGROUND AND OBJECTIVE Integrating data from multiple sources is a crucial and challenging problem. Even though numerous algorithms exist for record linkage or deduplication, they suffer from either long running times or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any n...
Similar resources
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Datasets from different agencies contain data on the same individuals. Linking these datasets to identify all the records belonging to the same individual is a crucial and challenging problem, especially given the large volumes of data. Many available record linkage algorithms suffer from either time inefficiency or low accuracy in distinguishing matches from non-matches among the records....
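To illustrate the clustering technique named in this title, here is a minimal sketch of complete-linkage agglomerative clustering over Levenshtein edit distance. It is a generic textbook version and not the cited paper's algorithm. It assumes each record is a single string, and the names `levenshtein`, `complete_linkage`, and the threshold `max_dist` are hypothetical. Because it compares every pair of clusters on every merge, it runs in roughly cubic time and is only meant for small inputs.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def complete_linkage(records: list[str], max_dist: int) -> list[list[str]]:
    """Repeatedly merge the two clusters whose *farthest* pair of records is
    closest; stop once that distance exceeds max_dist (hypothetical threshold).
    Every pair of records inside a returned cluster is within max_dist."""
    clusters = [[i] for i in range(len(records))]
    dist = {(i, j): levenshtein(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}

    def d(i: int, j: int) -> int:
        return dist[(i, j) if i < j else (j, i)]

    def linkage(c1: list[int], c2: list[int]) -> int:
        # Complete linkage: maximum pairwise distance between two clusters.
        return max(d(i, j) for i in c1 for j in c2)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > max_dist:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a remains valid
    return [[records[i] for i in c] for c in clusters]
```

With `max_dist=2`, the records `"john smith"`, `"jon smith"`, `"john smyth"`, `"mary jones"`, `"mary jonse"` fall into two clusters, one per person. Complete linkage is stricter than single linkage: a record joins a cluster only if it is close to every member, which limits chaining of loosely similar names.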
Summarization Algorithms for Record Linkage
Record linkage has received significant attention in recent years due to the plethora of data sources that have to be integrated to facilitate data analyses. In several cases, such an integration involves disparate data sources containing huge volumes of records and must be performed in near real-time in order to support critical applications. In this paper, we propose the first summarization a...
Efficient Sequential and Parallel Algorithms for Maximal Bipartite Sets
A maximal bipartite set (MBS) in an undirected graph G = (V, E) is a maximal collection of vertices B ⊆ V whose induced subgraph is bipartite. In this paper we present efficient sequential (linear-time) and parallel (NC) algorithms for constructing an MBS.
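The definition above admits a simple greedy construction. Scan the vertices once and keep v whenever B ∪ {v} still induces a bipartite subgraph. The sketch below tests that condition with a union-find structure that records each vertex's 2-coloring parity relative to its root, which gives near-linear O(m α(n)) time. This is a swapped-in greedy technique, not the paper's linear-time or NC algorithm. It assumes a simple graph (no self-loops) given as adjacency lists, and `ParityDSU` and `maximal_bipartite_set` are hypothetical names.

```python
class ParityDSU:
    """Union-find that also stores each vertex's 2-coloring parity
    relative to its parent, so component colorings can be queried."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.parity = [0] * n  # color(x) XOR color(parent[x])
        self.size = [1] * n

    def find(self, x: int) -> tuple[int, int]:
        """Return (root, color of x relative to root), compressing the path."""
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        self.parity[x] ^= p
        self.parent[x] = root
        return root, self.parity[x]

    def union_opposite(self, a: int, b: int) -> None:
        """Merge the sets of a and b so that a and b get different colors."""
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra == rb:
            return  # consistency was already checked by the caller
        if self.size[ra] < self.size[rb]:  # union by size
            ra, rb, pa, pb = rb, ra, pb, pa
        self.parent[rb] = ra
        self.parity[rb] = pa ^ pb ^ 1  # forces color(a) != color(b)
        self.size[ra] += self.size[rb]


def maximal_bipartite_set(n: int, adj: list[list[int]]) -> list[int]:
    """Greedy MBS for a simple undirected graph on vertices 0..n-1."""
    dsu = ParityDSU(n)
    in_b = [False] * n
    for v in range(n):
        # v may join B only if, in each component of B that v touches,
        # all of v's neighbours share one color; otherwise an odd cycle forms.
        forced: dict[int, int] = {}
        ok = True
        for u in adj[v]:
            if in_b[u]:
                root, color = dsu.find(u)
                if forced.setdefault(root, color) != color:
                    ok = False
                    break
        if ok:
            in_b[v] = True
            for u in adj[v]:
                if in_b[u]:
                    dsu.union_opposite(v, u)
    return [v for v in range(n) if in_b[v]]
```

For a triangle 0-1-2 with a pendant vertex 3 attached to 0 (`adj = [[1, 2, 3], [0, 2], [0, 1], [0]]`), the sketch returns `[0, 1, 3]`. The result is maximal because B only grows. A vertex rejected at its turn already closes an odd cycle with the B of that moment, and that cycle is still present in the final B ∪ {v}.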
Efficient Sequential and Parallel Algorithms for the Negative Cycle Problem
We present here an algorithm for detecting (and outputting, if one exists) a negative cycle in an n-vertex planar digraph G with real edge weights. Its running time ranges from O(n) up to O(n log n) as a certain topological measure of G varies from 1 up to Θ(n). Moreover, an efficient CREW PRAM implementation is given. Our algorithm also applies to digraphs whose genus γ is o(n).
Journal
Journal title: Journal of the American Medical Informatics Association
Year: 2014
ISSN: 1067-5027,1527-974X
DOI: 10.1136/amiajnl-2013-002034